This brief is filed in response to a Final Office Action issued by the U.S. Patent
and Trademark Office (the "PTO") in the above-referenced application on July 5, 2005. 

On October 14, 2005, Appellants filed a Notice of Appeal in the above-identified 
patent application from the final rejection of claims 1-43. In accordance with 37 C.F.R. § 41.37,
this Appeal Brief is submitted in support of the Appeal of the final rejection. The fee for this
Appeal, as set forth in 37 C.F.R. § 41.20(b)(2), is enclosed. For the reasons set forth below, the
final rejection of pending claims 1-43 should be reversed.

I. REAL PARTY IN INTEREST

The real party in interest is The Trustees of Columbia University in the City of 
New York, by way of assignment from the named inventors, recorded on August 13, 2001 at 
Reel 012068, Frame 0221. 

II. RELATED APPEALS AND INTERFERENCES

Appellants and the Appellants' legal representatives are unaware of any appeals 
or interferences related to the present application which will directly affect or be directly affected 
by or have a bearing on the Board's decision in the pending appeal. 

III. STATUS OF CLAIMS 

In the July 5, 2005 Final Office Action, claims 1-43 were rejected under 35 
U.S.C. § 102(e) as allegedly being anticipated by Application Publication No. 2001/0000962 of 
Rajan (hereinafter "Rajan"). Appellants respectfully traverse the rejections of record. 

A copy of all of the pending claims is attached hereto in the Claims Appendix at
page A-1.

IV. STATUS OF AMENDMENTS 

Subsequent to the issuance of the Final Official Action dated July 5, 2005, no 
further amendments to the claims have been filed by Appellants. 

V. SUMMARY OF CLAIMED SUBJECT MATTER 

The invention described in the above-identified application is directed to a 
method and system for generating a description record from multimedia information (e.g.,
Specification, page 4, lines 24-27). Specifically, the present invention has useful applications in, 
e.g., cataloging, indexing and searching multimedia content, as is described in more detail below. 
(Specification, page 4, line 24 - p. 5, line 12).

As defined by independent claim 1, the claimed invention is directed to a system
for generating a description record from multimedia information, comprising, inter alia:

a computer processor, coupled to said at least one multimedia information 
input interface, receiving said multimedia information therefrom, 
processing said multimedia information by performing object extraction 
processing to generate multimedia object descriptions from said 
multimedia information, and processing said generated multimedia object
descriptions by object hierarchy processing to generate multimedia object 
hierarchy descriptions indicative of an organization of said object 
descriptions, wherein at least one description record including said 
multimedia object descriptions and said multimedia object hierarchy 
descriptions is generated for content embedded within said multimedia 
information; and 

(Claim 1). 

Importantly, the claimed invention includes the limitation of "performing object 
extraction processing to generate multimedia object descriptions from said multimedia 
information, and processing said generated multimedia object descriptions by object hierarchy 
processing to generate multimedia object hierarchy descriptions indicative of an organization of 
said object descriptions, wherein at least one description record including said multimedia object 
descriptions and said multimedia object hierarchy descriptions is generated for content 
embedded within said multimedia information." (Claim 1) (See, e.g., Specification pp. 11-14, 
26-29; Figs. 2, 7, 8). Similar limitations are provided in independent method claim 17,
including, e.g.: 

processing said multimedia information by performing object extraction 
processing to generate multimedia object descriptions from said 
multimedia information; 

processing said generated multimedia object descriptions by object 
hierarchy processing to generate multimedia object hierarchy descriptions 
indicative of an organization of said object descriptions, wherein at least 
one description record including said multimedia object descriptions and 
said multimedia object hierarchy descriptions is generated 

(Claim 17).

and in independent computer-readable medium claim 33, which includes, inter alia:

one or more multimedia object descriptions, generated by performing 
object extraction processing, said object descriptions describing 
corresponding multimedia objects; 

one or more features characterizing each of said multimedia object 
descriptions; 

one or more multimedia object hierarchy descriptions indicative of an 
organization of said object descriptions, if any, relating at least a portion 
of said one or more multimedia objects in accordance with one or more 
characteristics. 

(Claim 33). 

It is an object of the claimed invention to solve a problem in the art regarding the 
indexing, classification and searching of multimedia content. The contents of textual 
information can be easily searched using prior art systems such as internet search engines (e.g., 
Yahoo!, Google, etc.) and other text search systems. The claimed invention facilitates this same 
sort of content search functionality for collections of multimedia information, such as pictures 
and video. (The Background of the Invention describes attempts in the prior art to provide 
multimedia databases which permit users to search for pictures using characteristics such as 
color, texture, and shape information of objects embedded in a picture, but nothing to perform a 
general search of, e.g., the Internet or other computer networks for multimedia content 
(Specification, p. 2, lines 2-7)). The present invention relates to a description record which 
contains descriptive information regarding multimedia information (e.g., descriptions of what is 
shown in digital video or in a digital picture). These description records are useful for
categorizing, indexing, and searching collections of multimedia information based on the
contents of that information. 

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

Claims 1-43 were rejected under 35 U.S.C. § 102(e) as allegedly being anticipated
by Application Publication No. 2001/0000962 of Rajan (hereinafter "Rajan"). Appellants
respectfully request review of all rejections of record.

VII. ARGUMENT

A. The Rejections Under 35 U.S.C. § 102(e) in view of Rajan Should Be
Reversed 

In the July 5, 2005 Final Office Action, claims 1-43 were rejected under 35 
U.S.C. § 102(e) as allegedly being anticipated by Application Publication No. 2001/0000962 of 
Rajan (hereinafter "Rajan"). Appellants respectfully traverse the rejections of record. 

1. Relevant Case Law 

To establish an anticipation rejection, the cited reference must teach every 
element of the claimed invention. 35 U.S.C. § 102(e) states, in pertinent part, that "[a] person 
shall be entitled to a patent unless the invention was described in (1) an application for patent . . . 
by another filed in the United States before the invention by the applicant for patent." A patent 
claim is thus anticipated under Section 102 if, among other things, "identity of invention" is 
shown. Minnesota Mining and Manufacturing Co. v. Johnson & Johnson Orthopedics, Inc., 976
F.2d 1559, 1565, 24 U.S.P.Q.2d 1321 (Fed. Cir. 1992). In finding identity of invention, one
"must show that each element of the claim in issue is found ... in a single prior art reference." 
Id. The Federal Circuit has held that, "[a] claim is anticipated only if each and every element as 
set forth in the claim is found, either expressly or inherently described, in a single prior art
reference." Verdegaal Bros. v. Union Oil Co. of California, 814 F.2d 628, 631, 2 U.S.P.Q.2d
1051 (Fed. Cir. 1987). Moreover, "[a] prior art publication cannot be modified by the
knowledge of those skilled in the art for purposes of anticipation." In re Saunders, 444 F.2d 599,
602-03, 170 U.S.P.Q. 213 (C.C.P.A. 1971).

2. Rajan's Priority Document Does Not Disclose the Text Cited By the 
Examiner 

Rajan is a continuation of International Patent Application PCT/US99/14306, 
filed on June 24, 1999 (the "International Rajan Application"). The present application claims
priority to Provisional Patent Application Serial No. 60/107,463, filed on November 6, 1998, 
which is earlier than the priority date of International Rajan Application. The International 
Rajan Application was based on Provisional Patent Application Serial No. 60/090,845, filed on 
June 26, 1998 (the "Rajan Provisional"). 

Accordingly, because the present application's priority date pre-dates the filing 
date of the International Rajan Application, but post-dates the Rajan Provisional, all portions of 
Rajan cited in the Final Office Action to reject the claims of the present invention must be 
supported in the Rajan Provisional. Several cited portions are not. 

In the Final Office Action, the Examiner cites Figure 1 and ¶¶ 0040-0045 of
Rajan as forming the basis for all rejections of record. (Final Office Action, pp. 2-4). However,
neither Figure 1 of Rajan nor ¶¶ 0040-0045 of the specification of Rajan was included in the
Rajan Provisional. A far less detailed and sophisticated version of Figure 1 was included in the
Rajan Provisional, which figure omits many of the details included in Figure 1 of Rajan. 
Furthermore, the description provided in the Rajan Provisional is less extensive than that of 
Rajan, and omits the paragraphs cited by the Examiner.

Accordingly, for at least this reason, Appellants respectfully request reversal of all
rejections of record. 



3. Rajan Does Not Disclose "performing object extraction processing to 
generate multimedia object descriptions"

Independent claim 1 is directed to a system for generating a description record
from multimedia information, comprising, inter alia:

a computer processor, coupled to said at least one multimedia information 
input interface, receiving said multimedia information therefrom, 
processing said multimedia information by performing object extraction 
processing to generate multimedia object descriptions from said 
multimedia information, and processing said generated multimedia object 
descriptions by object hierarchy processing to generate multimedia object 
hierarchy descriptions indicative of an organization of said object 
descriptions, wherein at least one description record including said 
multimedia object descriptions and said multimedia object hierarchy 
descriptions is generated for content embedded within said multimedia 
information 

(Claim 1). 

By way of background, the present invention relates to the MPEG-7 standard, 
which comprises techniques for describing and organizing multimedia information (in fact, the 
inventors of the present invention contributed to the development of that standard through 
participation in a standards-setting body). (See Specification, p. 2, lines 11-30). As described in
the Background of the Invention (starting at p. 1 of the Specification), the prior art provides 
means for searching textual information, both on the internet and locally. However, at the time 
of the present invention, there was no means for searching multimedia content. An aim of 
MPEG-7 is to process multimedia such as video data to extract information about what is shown 
in the video and provide descriptions that may later aid in searching or cataloging the video. 

"Performing object extraction processing to generate multimedia object descriptions," as recited
in the independent claims of the present application (and, by virtue of dependency, as is included
in all of the dependent claims), is an important procedure in accomplishing these and other
objects of the invention. (See, e.g., Specification, p. 26; Figs. 7 and 8).

Rajan is directed to a method and apparatus for composing and presenting 
multimedia programs (which is the province of a different standard - the MPEG-4 standard) at a 
multimedia terminal, including an architecture wherein the composition of a multimedia scene 
and its presentation are processed by two different entities - a "composition engine" and a
"presentation engine." See Rajan, ¶ 0002. The MPEG-4 standard generally "allows a user to
interact with video and audio objects within a scene," and allows a user to modify scenes by
deleting, adding, or repositioning objects, or changing the characteristics of objects, such as size,
color, and shape, for example. See Rajan, ¶ 0004. Accordingly, generally speaking, Rajan is a
different system directed to a different problem, i.e., composing and presenting multimedia 
video, from that of the present invention, which is instead directed to techniques for describing 
multimedia information content, e.g., to enable intelligent searching of multimedia content via, 
e.g., the Internet. See Specification, p. 1, lines 1-4, p. 9, lines 23-29. Indeed, this distinction is
inherent in the differences between the subject matter of the Rajan reference (e.g., MPEG-4
video composition and presentation) and the subject matter of the present invention (e.g., MPEG-7
video description), and would be immediately understood by one of ordinary skill in the art.

Rajan does not disclose or suggest at least the claimed "performing object
extraction processing to generate multimedia object descriptions." Indeed, the lack of such
disclosure in Rajan is not surprising, since although "object extraction processing to generate
multimedia object descriptions" is a key step in the present invention (in order to extract
information from a multimedia signal, such as a picture or video, to describe the contents of the
picture or video), it is entirely unnecessary for the purposes of MPEG-4 and Rajan (which are
directed to multimedia composition and presentation, and not extraction).

The Examiner, on pp. 4-6 of the Final Office Action, maintains that ¶¶ 0042-0046
of Rajan disclose all elements of claim 1. Appellants respectfully disagree.

In particular, the Examiner alleges that ¶ 0042 of Rajan discloses the claimed
object extraction (Final Office Action, p. 3). However, ¶ 0042 of Rajan states:

According to the MPEG-4 Systems standard, the scene description 
information is coded into a binary format known as BIFS (Binary Format 
for Scene). This BIFS data is packetized and multiplexed at a transmission 
site, such as a cable and/or satellite television headend, or a server in a
computer network, before being sent over a communication channel to a 
terminal 100. The data may be sent to a single terminal or to a terminal 
population. Moreover, the data may be sent via an open-access network or 
via a subscriber network. 

This portion of Rajan is directed to BIFS (Binary Format for Scene) coding, a technique for data 
transmission described in the MPEG-4 standard and which has no relation whatsoever to the 
object extraction for generating multimedia descriptions of the claimed invention. In particular, 
the Examiner refers to "scene description information" as allegedly meeting this claim limitation. 
It does not. The "scene description information" mentioned in Rajan is used in conjunction with 
BIFS to compose and transmit scene information for composition and presentation of a scene. 
This is entirely different from the claimed invention, which receives as an input multimedia 
information (which is already created) and, from that information, using, e.g., object extraction 
processing, extracts information about the multimedia information to generate multimedia object 
descriptions about that input multimedia information.

Furthermore, the claimed object extraction processing to generate multimedia
object descriptions is not disclosed nor suggested anywhere else in Rajan. For at least this
reason, Appellants respectfully submit that Rajan fails to disclose or suggest all elements of
independent claims 1, 17 and 33 and, accordingly, cannot properly anticipate the claimed
invention. Furthermore, because all dependent claims in this application depend ultimately from
one of these independent claims, Appellants submit that Rajan likewise cannot anticipate the
dependent claims for at least the foregoing reasons. Appellants respectfully submit that this
alone is sufficient basis to reverse all rejections of record.

4. Rajan Does Not Disclose "processing said generated multimedia
object descriptions by object hierarchy processing to generate 
multimedia object hierarchy descriptions" 

In addition to the limitations described above, claim 1 also includes "processing
said generated multimedia object descriptions by object hierarchy processing to generate
multimedia object hierarchy descriptions." As discussed above, because Rajan fails to disclose or
suggest generating "multimedia object descriptions," Rajan cannot possibly disclose or suggest
"processing said generated multimedia object descriptions." For at least this reason, this
additional limitation of claim 1 is not disclosed or suggested by Rajan.

Furthermore, as discussed above, Rajan's priority document (the Rajan
Provisional) fails to support the cited portion of Rajan which forms the basis for the Examiner's
rejections in this respect. Specifically, the Examiner relies on ¶ 0043 of Rajan as allegedly
disclosing "processing said generated multimedia object descriptions by object hierarchy 
processing to generate multimedia object hierarchy descriptions," presumably based on the 
following text of Rajan:

The scene description information describes the logical structure of a
scene, and indicates how objects are grouped together. Specifically, an 
MPEG-4 scene follows a hierarchical structure, which can be 
represented as a directed acyclic (tree) graph, where each node or a group 
of nodes, of the graph, represents a media object. 

Rajan, ¶ 0043.

This lone reference in Rajan to any alleged "hierarchy" is not supported in the Rajan priority
document (i.e., the Rajan Provisional), which is the only document to pre-date the priority date 
of the present application. Rajan therefore cannot be said to disclose or suggest any type of 
hierarchy for purposes of prior art to the present application, let alone the claimed "processing 
said generated multimedia object descriptions by object hierarchy processing to generate 
multimedia object hierarchy descriptions." For at least this independent reason, Appellants respectfully
request reversal of all rejections of record. 

Additionally, while ¶ 0043 generally discloses that an "MPEG-4 scene follows a
hierarchical structure," it nowhere indicates that multimedia object descriptions, which are
generated by, e.g., performing object extraction, are processed by object hierarchy processing.

Furthermore, ¶ 0043 contains the only reference to any "hierarchical structure" in
the entirety of Rajan. The "hierarchical structure" mentioned here is not described in detail, and
the description is certainly not enabling to one of ordinary skill in the art. To be prior art
under § 102, a reference must put the allegedly anticipating subject matter at issue into the
possession of the public through an enabling disclosure. See Chester v. Miller, 906 F.2d 1574, 
1576 n.2, 15 U.S.P.Q.2d 1333 (Fed. Cir. 1990). For at least this reason, the claimed "processing 
said generated multimedia object descriptions by object hierarchy processing to generate 
multimedia object hierarchy descriptions" is not disclosed for purposes of anticipation.

Furthermore, the claimed object hierarchy processing to generate multimedia
object hierarchy descriptions and this lone instance of "hierarchy" in Rajan are completely
different. The cited paragraph of Rajan, ¶ 0043, states that "an MPEG-4 scene follows a
hierarchical structure." This indicates that the "tree structure" discussed in Rajan is for purposes
of flow of a video scene in space and time. This is further apparent when considered in the
context of the problem which Rajan is directed to, i.e., composing and presenting multimedia
video in space and time. Rajan relates to scenes.

However, the claimed hierarchy, as further described and defined by the 
Applicants, e.g., at pp. 17-20 of the present application, relates to an object hierarchy for 
description of particular video objects with varying levels of specificity - for purposes of content 
description, and not hierarchy of a scene for composing or presenting the scene. The claimed 
object hierarchy processing can produce a "physical hierarchy" and a "logical hierarchy," which 
relate to the physical location of objects in an image, and a higher level hierarchy based on 
semantic descriptions of the objects in the image, respectively. (See Specification, p. 17; Fig. 4).
The object hierarchy descriptions may include semantic information which is useful for 
searching a library of multimedia segments, such as "names of the picture, the names of persons 
in the picture, the location where the picture was taken, the event that is represented by the 
picture, the date of the picture, color features... ." (Specification, p. 20). 

Accordingly, because Rajan fails to disclose or suggest at least these additional 
claimed features, Rajan fails to anticipate independent claims 1, 17 and 33. Additionally,
because all dependent claims contain the foregoing limitations through dependency from the
independent claims, Appellants respectfully submit that the rejections of record should be
reversed as to all claims.

5. Rajan Does Not Disclose "feature extraction"

Claims 3, 7, 10, 15, 19, 23, 26 and 31 are not anticipated by Rajan by virtue of 
their dependency from claim 1, and for the reasons discussed above. Additionally, these claims 
include the further limitation of "feature extraction." (See Specification, p. 26). Regardless of 
the outcome with respect to the independent claims, claims 3, 7, 10, 15, 19, 23, 26 and 31 are 
independently patentable over Rajan for the additional reason that Rajan fails to disclose or 
suggest "feature extraction." 

The Examiner asserts that the following ¶ 0045 of Rajan describes this claimed
feature:

The scene description information can also indicate attribute value 
selection. Individual media objects and scene description nodes expose a 
set of parameters to a composition layer through which part of their 
behavior can be controlled. Examples include the pitch of a sound, the 
color for a synthetic object, activation or deactivation of enhancement 
information for scaleable coding, and so forth. 

Rajan, ¶ 0045.

Appellants cannot find any reference in the above-cited paragraph to the claimed 
"feature extraction." Appellants can only guess that the Examiner refers to, e.g., "the pitch of a 
sound, the color for a synthetic object" and the like as being "features," and that, as a result, this 
somehow equates to the feature extraction of the claimed invention. Appellants respectfully 
disagree. Again, as Rajan relates to composing and presenting multimedia, it has no need to 
"extract" information about that multimedia it has composed. The cited paragraph of Rajan in 
particular relates to controlling the behavior/composition of objects in a scene, and not extracting 
information from multimedia. Accordingly, for at least this additional reason, Appellants
respectfully submit that the rejections of claims 3, 7, 10, 15, 19, 23, 26 and 31, and the claims
which depend from them, should be reversed.

Further regarding claims 3, 7, 19 and 23, on p. 10 of the Final Office Action, the
Examiner refers to the following portion of Rajan which allegedly "reads on image segmentation
and feature extraction":

4. The MPEG-4 communication standard allows a user to interact 
with video and audio objects within a scene, whether they are from 
conventional sources, such as moving video, or from synthetic (computer 
generated) sources. The user can modify scenes by deleting, adding or 
repositioning objects or changing the characteristics of the objects, such as 
size, color, and shape, for example. 

6. The objects can exist independently, or be joined with other 
objects in a scene in a grouping known as a "composition." Visual objects 
in a scene are given a position in two- or three- dimensional space while 
audio objects can be placed in a sound space. 

8. BIFS commands can add or delete objects from a scene, for 
example, or changed [sic] the visual or acoustic properties of objects. 
BIFS commands also define, update, and position the objects. For 
example, a visual property such as the color or size of an object can be 
changed, or the object can be animated. 

None of these cited paragraphs of Rajan discloses or even remotely suggests image segmentation
or feature extraction - which is entirely expected, again, since Rajan deals with the problems
associated with MPEG-4 (intended for composing and presenting multimedia content) and not
those addressed by MPEG-7 (intended for extraction of information from multimedia content in
order to categorize and search the multimedia content). As discussed above, for at least this
additional reason, Appellants respectfully submit that the rejections in the Final Office Action
should be reversed.

6. The Claims Recite "generation of multimedia object hierarchy"

Additionally, there are other deficiencies in the Examiner's "Response to
Arguments," which begins at p. 4 of the Final Office Action. First, on page 8, the Examiner
states the following:

"Examiner is not persuaded. In response to applicant's argument that the 
references fail to show certain features of applicant's invention, it is noted 
that the features upon which applicant relies (i.e., generation of 
multimedia object hierarchy) is not recited in the rejected claim(s)."

Final Office Action, p. 8. 
However, Appellants refer to text in, e.g., claim 1, which states: "processing said generated 
multimedia object descriptions by object hierarchy processing to generate multimedia object 
hierarchy descriptions." 

In this respect, as discussed above, Rajan and MPEG-4 deal with "hierarchy" in
the sense of scene descriptions, wherein objects are described in a hierarchical fashion to provide 
composition and presentation information for presenting multimedia content. The present 
invention, however, refers to object hierarchy descriptions, as described throughout the 
specification, including at, e.g., pp. 19-20. The Examiner has alleged that Appellants argued 
features not recited in the rejected claims. However, this feature is recited in several of the 
rejected claims. 

7. The Cited MPEG-4 Reference Is Not Evidence Of The State Of The 
Art 

Additionally, though it apparently did not form the basis for any rejections, the
Examiner relies on a 1999 MPEG-4 publication as alleged evidence of the state of the art at the
time the present application was filed. (See Final Office Action, p. 9). However, the present
application claims priority to November 6, 1998. For at least this reason, the cited reference is
not a valid indicator of the state of the art at the time of the present invention. Additionally, as
explained in detail above, the cited reference refers to MPEG-4 technology, which is distinct 
from and unrelated to the claimed invention. 
B. Conclusion 

For at least the reasons indicated above, Appellants respectfully submit that the
invention recited in the claims of the present application, as discussed above, is not anticipated 
by the cited prior art. Reversal of the Examiner's rejections of the claims is therefore 
respectfully requested. 

Respectfully submitted,

VIII. CLAIMS APPENDIX

Claims 1-43 are pending in this application: 

1. (Original) A system for generating a description record from multimedia
information, comprising: 

(a) at least one multimedia information input interface receiving said 
multimedia information; 

(b) a computer processor, coupled to said at least one multimedia information 
input interface, receiving said multimedia information therefrom, 
processing said multimedia information by performing object extraction 
processing to generate multimedia object descriptions from said 
multimedia information, and processing said generated multimedia object 
descriptions by object hierarchy processing to generate multimedia object 
hierarchy descriptions indicative of an organization of said object 
descriptions, wherein at least one description record including said 
multimedia object descriptions and said multimedia object hierarchy 
descriptions is generated for content embedded within said multimedia 
information; and 

(c) a data storage system, operatively coupled to said processor, for storing 
said at least one description record. 



2. (Original) The system of claim 1, wherein said multimedia
information comprises image information, said multimedia object 
descriptions comprise image object descriptions, and said multimedia 
object hierarchy descriptions comprise image object hierarchy 
descriptions. 

3. (Original) The system of claim 2, wherein said object extraction 
processing comprises: 

(a) image segmentation processing to segment each image in said image 
information into regions within said image; and

(b) feature extraction processing to generate one or more feature descriptions
for one or more of said regions;

whereby said generated object descriptions comprise said one or more feature
descriptions for one or more of said regions.

4. (Original) The system of claim 3, wherein said one or more feature 
descriptions are selected from the group consisting of text annotations, 
color, texture, shape, size, and position. 

5. (Original) The system of claim 2, wherein said object hierarchy 
processing comprises physical object hierarchy organization to generate 
physical object hierarchy descriptions of said image object descriptions 
that are based on spatial characteristics of said objects, such that said 
image object hierarchy descriptions comprise physical descriptions. 

6. (Original) The system of claim 5, wherein said object hierarchy 
processing further comprises logical object hierarchy organization to 
generate logical object hierarchy descriptions of said image object 
descriptions that are based on semantic characteristics of said objects, such 
that said image object hierarchy descriptions comprise both physical and 
logical descriptions. 

7. (Original) The system of claim 6, wherein said object extraction 
processing comprises: 

(a) image segmentation processing to segment each image in said image 
information into regions within said image; and 

(b) feature extraction processing to generate object descriptions for one or 
more of said region; 

and wherein said physical hierarchy organization and said logical hierarchy 
organization, generate hierarchy descriptions of said object descriptions 
for said one or more of said regions. 

8. (Original) The system of claim 7, further comprising an encoder 
receiving said image object hierarchy descriptions and said image object 
descriptions, and encoding said image object hierarchy descriptions and 
said image object descriptions into encoded description information, 
wherein said data storage system is operative to store said encoded 
description information as said at least one description record. 

9. (Original) The system of claim 1, wherein said multimedia 
information comprises video information, said multimedia object 
descriptions comprise video object descriptions including both event 
descriptions and object descriptions, and said multimedia hierarchy 
descriptions comprise video object hierarchy descriptions including both 
event hierarchy descriptions and object hierarchy descriptions. 

10. (Original) The system of claim 9, wherein said object extraction 
processing comprises: 

(a) temporal video segmentation processing to temporally segment said video 
information into one or more video events or groups of video events and 
generate event descriptions for said video events, 

(b) video object extraction processing to segment said one or more video 
events or groups of video events into one or more regions, and to generate 
object descriptions for said regions; and 

(c) feature extraction processing to generate one or more event feature 
descriptions for said one or more video events or groups of video events, 
and one or more object feature descriptions for said one or more regions; 

wherein said generated video object descriptions include said event feature 
descriptions and said object descriptions. 

11. (Original) The system of claim 10, wherein said one or more event 
feature descriptions are selected from the group consisting of text 
annotations, shot transition, camera motion, time and key frame, and 
wherein said one or more object feature descriptions are selected from the 
group consisting of color, texture, shape, size, position, motion, and time. 

12. (Original) The system of claim 9, wherein said object hierarchy 
processing comprises physical event hierarchy organization to generate 
physical event hierarchy descriptions of said video object descriptions that 
are based on temporal characteristics of said video objects, such that said 
video hierarchy descriptions comprise temporal descriptions. 

13. (Original) The system of claim 12, wherein said object hierarchy 
processing further comprises logical event hierarchy organization to 
generate logical event hierarchy descriptions of said video object 
descriptions that are based on semantic characteristics of said video 
objects, such that said hierarchy descriptions comprise both temporal and 
logical descriptions. 

14. (Original) The system of claim 13, wherein said object hierarchy 
processing further comprises physical and logical object hierarchy 
extraction processing, receiving said temporal and logical descriptions and 
generating object hierarchy descriptions for video objects embedded 
within said video information, such that said video hierarchy descriptions 
comprise temporal and logical event and object descriptions. 

15. (Original) The system of claim 14, wherein said object extraction 
processing comprises: 

(a) temporal video segmentation processing to temporally segment said video 
information into one or more video events or groups of video events and 
generate event descriptions for said video events, 

(b) video object extraction processing to segment said one or more video 
events or groups of video events into one or more regions, and to generate 
object descriptions for said regions; and 

(c) feature extraction processing to generate one or more event feature 
descriptions for said one or more video events or groups of video events, 
and one or more object feature descriptions for said one or more regions; 

wherein said generated video object descriptions include said event feature 
descriptions and said object descriptions, and wherein said physical event 
hierarchy organization and said logical event hierarchy organization 
generate hierarchy descriptions from said event feature descriptions, and 
wherein said physical object hierarchy organization and said logical object 
hierarchy organization generate hierarchy descriptions from said object 
feature descriptions. 

16. (Original) The system of claim 15, further comprising an encoder 
receiving said video object hierarchy descriptions and said video object 
descriptions, and encoding said video object hierarchy descriptions 
and said video object descriptions into encoded description information, 
wherein said data storage system is operative to store said encoded 
description information as said at least one description record. 

17. (Original) A method for generating a description record from 
multimedia information, comprising the steps of: 

(a) receiving said multimedia information; 

(b) processing said multimedia information by performing object extraction 
processing to generate multimedia object descriptions from said 
multimedia information; 

(c) processing said generated multimedia object descriptions by object 
hierarchy processing to generate multimedia object hierarchy descriptions 
indicative of an organization of said object descriptions, wherein at least 
one description record including said multimedia object descriptions and 
said multimedia object hierarchy descriptions is generated for content 
embedded within said multimedia information; and 

(d) storing said at least one description record. 

18. (Original) The method of claim 17, wherein said multimedia 
information comprises image information, said multimedia object 
descriptions comprise image object descriptions, and said multimedia 
object hierarchy descriptions comprise image object hierarchy 
descriptions. 

19. (Previously amended) The method of claim 18, wherein said object 
extraction processing step comprises the sub-steps of: 

(a) image segmentation processing to segment each image in said image 
information into regions within said image; and 

(b) feature extraction processing to generate one or more feature descriptions 
for one or more of said regions; 

whereby said generated image object descriptions comprise said one or more 
feature descriptions for one or more of said regions. 

20. (Original) The method of claim 19, wherein said one or more feature 
descriptions are selected from the group consisting of text annotations, 
color, texture, shape, size, and position. 

21. (Original) The method of claim 18, wherein said step of object 
hierarchy processing includes the sub-step of physical object hierarchy 
organization to generate physical object hierarchy descriptions of said 
image object descriptions that are based on spatial characteristics of said 
objects, such that said image hierarchy descriptions comprise physical 
descriptions. 

22. (Original) The method of claim 21, said step of object hierarchy 
processing further includes the sub-step of logical object hierarchy 
organization to generate logical object hierarchy descriptions of said 
image object descriptions that are based on semantic characteristics of said 
objects, such that said image object hierarchy descriptions comprise both 
physical and logical descriptions. 

23. (Original) The method of claim 22, wherein said step of object 
extraction processing further includes the sub-steps of: 

(a) image segmentation processing to segment each image in said image 
information into regions within said image; and 

(b) feature extraction processing to generate object descriptions for one or 
more of said region; 

and wherein said physical object hierarchy organization sub-step and said logical 
object hierarchy organization sub-step generate hierarchy descriptions of 
said object descriptions for said one or more of said regions. 

24. (Previously presented) The method of claim 18, further comprising 
the step of encoding said image object descriptions and said image object 
hierarchy descriptions into encoded description information prior to said 
data storage step. 

25. (Original) The method of claim 17, wherein said multimedia 
information comprises video information, said multimedia object 
descriptions comprise video object descriptions including both event 
descriptions and object descriptions, and said multimedia hierarchy 
descriptions comprise video object hierarchy descriptions including both 
event hierarchy descriptions and object hierarchy descriptions. 

26. (Original) The method of claim 25, wherein said step of object 
extraction processing comprises the sub-steps of: 

(a) temporal video segmentation processing to temporally segment said video 
information into one or more video events or groups of video events and 
generate event descriptions for said video events, 

(b) video object extraction processing to segment said one or more video 
events or groups of video events into one or more regions, and to generate 
object descriptions for said regions; and 

(c) feature extraction processing to generate one or more event feature 
descriptions for said one or more video events or groups of video events, 
and one or more object feature descriptions for said one or more regions; 

wherein said generated video object descriptions include said event feature 
descriptions and said object descriptions. 

27. (Original) The method of claim 26, wherein said one or more event 
feature descriptions are selected from the group consisting of text 
annotations, shot transition, camera motion, time and key frame, and 
wherein said one or more object feature descriptions are selected from the 
group consisting of color, texture, shape, size, position, motion, and time. 

28. (Original) The method of claim 25, wherein said step of object 
hierarchy processing includes the sub-step of physical event hierarchy 
organization to generate physical event hierarchy descriptions of said 
video object descriptions that are based on temporal characteristics of said 
video objects, such that said video hierarchy descriptions comprise 
temporal descriptions. 

29. (Original) The method of claim 28, wherein said step of object 
hierarchy processing further includes the sub-step of logical event 
hierarchy organization to generate logical event hierarchy descriptions of 
said video object descriptions that are based on semantic characteristics of 
said video objects, such that said hierarchy descriptions comprise both 
temporal and logical descriptions. 

30. (Original) The method of claim 29, wherein said step of object 
hierarchy processing further comprises the sub-step physical and logical 
object hierarchy extraction processing, receiving said temporal and logical 
descriptions and generating object hierarchy descriptions for video objects 
embedded within said video information, such that said video hierarchy 
descriptions comprise temporal and logical event and object descriptions. 

31. (Original) The method of claim 30, wherein said step of object 
extraction processing comprises the sub-steps of: 

(a) temporal video segmentation processing to temporally segment said video 
information into one or more video events or groups of video events and 
generate event descriptions for said video events, 

(b) video object extraction processing to segment said one or more video 
events or groups of video events into one or more regions, and to generate 
object descriptions for said regions; and 

(c) feature extraction processing to generate one or more event feature 
descriptions for said one or more video events or groups of video events, 
and one or more object feature descriptions for said one or more regions; 

wherein said generated video object descriptions include said event feature 
descriptions and said object descriptions, and wherein said physical event 
hierarchy organization and said logical event hierarchy organization 
generate hierarchy descriptions from said event feature descriptions, and 
wherein said physical object hierarchy organization and said logical object 
hierarchy organization generate hierarchy descriptions from said object 
feature descriptions. 

32. (Previously presented) The method of claim 31, further comprising 
the step of encoding said video object descriptions and said video object 
hierarchy descriptions into encoded description information prior to said 
data storage step. 

33. (Previously presented) A computer readable media containing 
digital information with at least one multimedia description record 
describing multimedia content for corresponding multimedia information, 
the description record comprising: 

(a) one or more multimedia object descriptions, generated by 
performing object extraction processing, said object descriptions 
describing corresponding multimedia objects; 

(b) one or more features characterizing each of said multimedia object 
descriptions; and 

(c) one or more multimedia object hierarchy descriptions indicative of 
an organization of said object descriptions, if any, relating at least a 
portion of said one or more multimedia objects in accordance with one or 
more characteristics. 

34. (Original) The computer readable media of claim 33, wherein said 
multimedia information comprises image information, said multimedia 
objects comprise image objects, said multimedia object descriptions 
comprise image object descriptions, and said multimedia object hierarchy 
descriptions comprise image object hierarchy descriptions. 

35. (Original) The computer readable media of claim 34, wherein said one 
or more features are selected from the group consisting of text annotations, 
color, texture, shape, size, and position. 

36. (Original) The computer readable media of claim 34, wherein said 
image object hierarchy descriptions comprise physical object hierarchy 
descriptions of said image object descriptions based on spatial 
characteristics of said image objects. 

37. (Original) The computer readable media of claim 36, wherein said 
image object hierarchy descriptions further comprises logical object 
hierarchy descriptions of said image object descriptions based on semantic 
characteristics of said image objects. 

38. (Original) The computer readable media of claim 33, wherein said 
multimedia information comprises video information, said multimedia 
objects comprise events and video objects, said multimedia object 
descriptions comprise video object descriptions including both event 
descriptions and object descriptions, said features comprise video event 
features and video object features, and said multimedia hierarchy 
descriptions comprise video object hierarchy descriptions including both 
event hierarchy descriptions and object hierarchy descriptions. 

39. (Original) The computer readable media of claim 38, wherein said one 
or more event feature descriptions are selected from the group consisting 
of text annotations, shot transition, camera motion, time and key frame, 
and wherein said one or more object feature descriptions are selected from 
the group consisting of color, texture, shape, size, position, motion, and 
time. 

40. (Original) The computer readable media of claim 38, wherein said 
event hierarchy descriptions comprise one or more physical hierarchy 
descriptions of said events based on temporal characteristics. 

41. (Original) The computer readable media of claim 40, wherein said 
event hierarchy descriptions further comprise one or more logical 
hierarchy descriptions of said events based on semantic characteristics. 

42. (Original) The computer readable media of claim 38, wherein said 
object hierarchy descriptions comprise one or more physical hierarchy 
descriptions of said objects based on temporal characteristics. 

43. (Original) The computer readable media of claim 39, wherein said 
object hierarchy descriptions further comprise one or more logical 
hierarchy descriptions, of said objects based on semantic characteristics. 

IX. EVIDENCE APPENDIX 

None. 

X. RELATED PROCEEDINGS APPENDIX 

None. 